Evaluation of Uncertain Image Classification and Segmentation Algorithms
Each year, numerous segmentation and classification algorithms are invented or reused to solve problems where
machine vision is needed. Generally, the efficiency of these algorithms is compared against the results given by
one or many human experts. However, in many situations, the location of the real boundaries of the objects
as well as their classes are not known with certainty by the human experts. Moreover, only one aspect of the
segmentation and classification problem is generally evaluated. In our evaluation method, we take into account
both the classification and segmentation results as well as the level of certainty given by the experts. As a concrete
example of our method, we evaluate an automatic seabed characterization algorithm based on sonar images.



1. INTRODUCTION 

Image classification and segmentation are two 
fundamental problems in image analysis. Seg- 
menting an image consists in dividing the image 
into homogeneous zones delimited by boundaries 
so as to separate the different entities visible in 
the image. Classification consists in labeling the 
various components visible in an image. A great
many segmentation and classification methods
have been proposed in the last thirty years [3];
enumerating them all is not the purpose of our
paper. However, an important open question
is how to benchmark these methods and evaluate 
their robustness with respect to a given real-life 
application. 

A typical example of the use of classification 
and segmentation is encountered in satellite or 
sonar imaging, where an important use of the 
data is to classify the types of soils present in 
the images, for instance to build maps. As the
amount of images gathered during a mission is
large, automatic recognition algorithms can
relieve human operators. Since the swath of the
sensor is wide, many types of soils can be encountered
within a single image, and the classification
must be done on a local neighborhood.
This neighborhood can be either limited to a sin- 
gle pixel, or often to a small tile of e.g. 16 x 16 
or 32 x 32 pixels taken as the unit for the classi- 
fication algorithm. The boundaries between the 
different patches corresponding to a category of 
soil are a form of segmentation, which is here an 
implicit byproduct of the classification. In other 
applications, segmentation can come first so as to 
isolate entities which will be labeled later. 

A difficulty raised in these applications is the 
lack of ground truth which could be used to eval- 
uate the result of the classification. The real ref- 
erence classes must be estimated by human ex- 
perts from the data themselves. However, the im- 
ages are difficult to read since they are corrupted 
by many phenomena and the estimation of the 
classes by the human expert will be highly subjec- 
tive and with a varying level of uncertainty. In the 
case of the automatic seabed classification, which 
we will use as our reference example throughout 
this paper, images are especially hard to interpret 
due to many imperfections J] . To reconstruct the 
image, a huge number of parameters (geometry of 
the device, coordinates of the ship, movements of 
the sonar, etc.) are taken into account, but these
data are polluted with a large amount of sensor
noise. In addition, other phenomena such as multipath
signal propagation (caused by reflection either on
the bottom or the surface), speckle, and the presence
of fauna and flora (e.g. shadows of fish on
the sea bottom) all increase the difficulty of
interpreting the image. Consequently, different
experts can propose different classifications of
the image. Thus, in order to evaluate automatic
classification, we must take into account this difference
and the uncertainty of each expert. Figure 1
exhibits the differences between the interpretation
and the certainty of two sonar experts
trying to differentiate the type of sediment (rock,
cobbles, sand, ripple, silt) or shadow when the information
is invisible (each color corresponds to a
kind of sediment and the associated certainty of
the expert for this sediment, expressed in terms of
sure, moderately sure and not sure).



Figure 1. Segmentation given by two experts.

We propose in this article a new approach for evaluating image classification and segmentation, taking into account the information given by multiple experts and the certainty of the given information. Classical evaluations of classification and segmentation do not take into account the uncertain and imprecise labels in the reference image provided by an expert. We think that we have to consider this kind of label in our evaluation approach. In section 2 we show how to integrate the expert certainty in the confusion matrix, and so deduce a good classification rate and an error classification rate. Moreover, our thesis is that global image classification evaluation must be made not only by evaluating the classification on considered units (with the confusion matrix) but also by evaluating, at the same time, the induced segmentation. In section 3 we propose two new distance-based measures in order to evaluate well and mis-segmented pixels by taking into account both the location of the borders and the expert certainty. Note that another important criterion to evaluate classification/segmentation approaches is the complexity of the algorithms [1], but we do not consider it in this paper. Finally, our evaluation is illustrated in section 4 on real sonar images acquired in a real, uncertain environment.



2. CLASSIFICATION EVALUATION 

Traditional classification systems can usually be described as a three-tiered process. First, significant features are extracted from the images to classify. These features differ widely depending on the application; they are generally described using a small set of abstract numerical measures. For example, commonly used features are the local luminance, the texture (described with measures such as the entropy, the co-occurrence matrices, etc.), and the contours (described with their length, their orientation, their relative position to other contours, etc.) [3]. Most of the time, a second stage is necessary to reduce these features, because they are too numerous. In the third stage, the numerical descriptors are fed to classification algorithms, which are application-independent, such as Support Vector Machines [31515], neural networks [2, 6, 7, 8], k-nearest neighbors [5], etc. The classification algorithms decide, depending on their inputs, which class the image belongs to.

Hence, we have to evaluate these classification algorithms in order to compare their robustness in a given application. The classical approach is based on the confusion matrix and does not take into account uncertain labels. We propose here a new confusion matrix, with good classification and error rates, that takes into account this kind of label and also the inhomogeneous units defined below.

The evaluation method proposed in this section can be applied to the evaluation of a classification algorithm in every domain where uncertain labels are provided. We do not consider here the problem of learning on uncertain and imprecise labels [10, 11, 12]: the classification can be made by this kind of algorithm or by others.

2.1. Classical Evaluation 

The results of one image classification can be 
observed and visually compared to the reality. 
But in order to evaluate a classification algorithm, 
many different configurations and tests must be 
considered. Classification algorithms can yield 
very variable results depending on the sample. 
Generally, the evaluation of classification algorithms is conducted with the confusion matrix. The confusion matrix is composed of the numbers $cm_{ij}$ of elements from class $i$ classified in class $j$. In order to obtain rates, which are easier to interpret, we can normalize this confusion matrix by:

$$Ncm_{ij} = \frac{cm_{ij}}{\sum_{j=1}^{N} cm_{ij}} = \frac{cm_{ij}}{N_i}, \qquad (1)$$

with $N$ the number of considered classes and $N_i$ the number of elements from the true class $i$. From this normalized confusion matrix a good classification rate vector can be written as:

$$GCR_i = Ncm_{ii}, \qquad (2)$$

and an error classification rate vector as:

$$ECR_i = \frac{1}{2}\left(\sum_{j=1,\, j \neq i}^{N} Ncm_{ij} + \frac{\sum_{j=1,\, j \neq i}^{N} Ncm_{ji}}{N-1}\right). \qquad (3)$$



This error classification rate is the mean of the 
two errors corresponding respectively to the el- 
ements from a given class i falsely classified as 
elements of another class (first term) , and to the 
elements classified in a given class j but being 
from another class i (second term). These errors 
are also called errors of the first and second kind. We
do not have to normalize the first term because of
the normalization of the confusion matrix on the
rows, but the second term must be normalized
by the number of rows minus one (because of the
$Ncm_{ii}$ term corresponding to the good classification).
Note that other error rates can be defined
(see e.g. [E]).
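
The three rates above can be sketched in a few lines of NumPy. This is only an illustrative sketch: the function name `classification_rates` and the toy matrix are assumptions, not part of the original method description, and a square confusion matrix with non-empty rows is assumed.

```python
import numpy as np

def classification_rates(cm):
    """Return (Ncm, GCR, ECR) for a square confusion matrix cm.

    cm[i, j] counts elements of true class i classified in class j.
    """
    cm = np.asarray(cm, dtype=float)
    n = cm.shape[0]
    row_sums = cm.sum(axis=1, keepdims=True)
    # Equation (1): row-wise normalization (empty rows stay at zero).
    ncm = np.divide(cm, row_sums, out=np.zeros_like(cm), where=row_sums > 0)
    # Equation (2): diagonal of the normalized matrix.
    gcr = np.diag(ncm).copy()
    # Equation (3): mean of the two kinds of error.
    first_kind = ncm.sum(axis=1) - gcr             # class i classified elsewhere
    second_kind = (ncm.sum(axis=0) - gcr) / (n - 1)  # other classes classified as i
    ecr = 0.5 * (first_kind + second_kind)
    return ncm, gcr, ecr

# Hypothetical 3-class confusion matrix (10 elements per true class).
cm = [[8, 2, 0],
      [1, 6, 3],
      [0, 1, 9]]
ncm, gcr, ecr = classification_rates(cm)
print(gcr)
print(ecr)
```

For a database of several images, the same function would be applied to the sum of the non-normalized matrices rather than to each image separately.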

We have seen that the evaluation of image classification algorithms must be made not only on one image but on the whole image database. As a direct consequence, we have to compute a non-normalized confusion matrix for each image and normalize the sum of these confusion matrices over all images of the database.

2.2. Evaluation with expert information 

Consider a general case where information is given by the expert on each pixel and the classification algorithm works on a unit of n x n pixels. In such a case, if an n x n tile is considered, more than one class can be present (we call it a patch-worked tile or inhomogeneous unit), and the classification algorithm can find only one of these classes. To handle this case, we consider that if the classification algorithm finds one of these classes on the tile, the algorithm is right in the proportion of this class found in the n x n tile and wrong in the proportion of the other classes in the tile. For instance, imagine the case where the expert considers a 16 x 16 tile and declares that 156 given pixels belong to class 1, and 100 other pixels belong to class 3. If the classification algorithm finds that the tile belongs to class 1, the confusion matrix will be updated by $cm_{11} = cm_{11} + 156/256$ and $cm_{31} = cm_{31} + 100/256$. Hence the confusion matrix is not composed of integer numbers and $N_i$ is also not an integer, but the column sums are still integers.

Now consider the case where the expert gives the class with a certainty grade. For instance, the operator can be moderately sure of his choice when he labels one part of the image as belonging to a certain class, and be totally doubtful on another part of the image. In our classification evaluation we must not treat these two references equally. Indeed, classical confusion matrices imply that the reality is perfectly known; this, unfortunately, is not the case in many real applications. We propose to represent this difference of information by different weights corresponding to the different certainty grades that are considered. For example, if three grades of certainty (sure, moderately sure and not sure) are considered, we can use respectively the weights 2/3, 1/2 and 1/3. In the confusion matrix, such weights can be integrated easily in the general sum. If one expert labels a tile as belonging to class 1 with a moderate certainty, and if the classification algorithm finds class 1, then with the previous weights the confusion matrix is updated as $cm_{11} = cm_{11} + 1/2$. If the classification algorithm finds class 2 on the considered tile, the update becomes $cm_{12} = cm_{12} + 1/2$. Hence the column sums are no longer integers.

In order to take into account the referenced im- 
ages provided by different experts, we can com- 
pare the classified image with all the expert- 
referenced images. Hence we obtain as many con- 
fusion matrices as experts, and we can simply 
combine them by addition. 

By the simple fact that we add the non- 
normalized confusion matrices, we weight the ob- 
tained results by the image size or the considered 
unit number. 
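
The updates described in this section can be sketched as follows. The helper `add_tile` and its arguments are hypothetical names, classes are 0-indexed (class 1 of the text is index 0), and combining tile proportions and certainty weights by summing per-pixel weights is an assumption made for this sketch; the text only describes the two mechanisms separately.

```python
import numpy as np

def add_tile(cm, expert_classes, expert_weights, predicted_class):
    """Add one classified tile to the (non-normalized) confusion matrix cm.

    expert_classes: per-pixel class given by the expert for this tile.
    expert_weights: per-pixel certainty weight (1.0 if certainty is ignored).
    Each pixel contributes weight / n_pixels to cm[true_class, predicted_class].
    """
    expert_classes = np.asarray(expert_classes)
    expert_weights = np.asarray(expert_weights, dtype=float)
    n_pixels = expert_classes.size
    for true_class in np.unique(expert_classes):
        mask = expert_classes == true_class
        cm[true_class, predicted_class] += expert_weights[mask].sum() / n_pixels
    return cm

# Inhomogeneous 16 x 16 tile: 156 pixels of class 1 and 100 pixels of class 3
# (indices 0 and 2), classified as class 1, certainty ignored.
classes = np.zeros(256, dtype=int)
classes[156:] = 2
classes = classes.reshape(16, 16)
cm_expert_a = add_tile(np.zeros((3, 3)), classes, np.ones((16, 16)), 0)

# Homogeneous tile of class 1 labeled "moderately sure" (weight 1/2) by a
# second expert, classified as class 2 (index 1).
cm_expert_b = add_tile(np.zeros((3, 3)), np.zeros((16, 16), dtype=int),
                       np.full((16, 16), 0.5), 1)

# Matrices obtained against several experts are combined by addition.
cm_total = cm_expert_a + cm_expert_b
print(cm_total)
```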

Consequently, in order to obtain rates, we can normalize the obtained confusion matrix with equation (1) and calculate the good classification rate vector with equation (2) and the error classification rate vector with equation (3). Of course these rates are not percentages anymore. For instance, the good classification rate is no longer the percentage of well classified units, because the weights given by the inhomogeneous units or by the expert certainty are rational.

In conclusion of this section: the interest of these newly obtained confusion matrix, good classification rate and error classification rate is that they give a good evaluation of classification taking into account the inhomogeneous units and the uncertainty of the experts. This approach can be applied in applications other than image classification, in fact in every domain where we try to classify uncertain elements.

3. SEGMENTATION EVALUATION 

Segmentation can either be obtained as a byproduct of the classification, as shown above, or be used as the first step of an image processing pipeline. Many methods of image segmentation and edge detection have been proposed [14, 15, 13, 16, 17]. It is important to be able to benchmark these methods and to evaluate their robustness; but to do that, measures are needed so as to have an objective means to judge the quality of the segmentation. No perfect measure exists today, and existing measures are not fully satisfactory; this is why we can imagine fusing the segmentation evaluation approaches [18].

On the one hand the image classification meth- 
ods are evaluated by the confusion matrix. Good 
classification rates and error rates are usually calculated
from this matrix. Note that in order to establish
the confusion matrix, the real class of the
considered units of the images needs to be known.
This gives only an evaluation of the classification 
approach on considered units of the image, but 
does not give an evaluation of the produced seg- 
mentation. 

On the other hand, segmentation evaluation cannot be made only by visual comparison between the initial image and the segmented image. Many evaluation approaches have been proposed for image segmentation [1, 16, 19, 20, 21]. We can consider two cases: either we do not have any a priori knowledge of the correct segmentation, or we do have such a priori knowledge. In the first case, many effectiveness measures based on intra-region uniformity, inter-region contrast and region shape have been proposed [1]. The second case requires reference images. In a real application, experts must manually provide the image segmentation via a visual inspection. [1] gives a review of usual discrepancy measures based on different distances (sometimes expressed in terms of probability) between the segmented pixel and the reference pixel.

Most of the time, only a measure of how many 
pixels are mis-segmented is given. We, on the 
contrary, propose in this article a combined study 
of one well-segmented pixel measure and one mis-segmented
pixel measure. Indeed, most of the
time, when a pixel is not mis-segmented, it is
not necessarily well-segmented either. As a consequence,
we can have few mis-segmented pixels
but also few well-segmented pixels, which means
that the segmentation is not good overall.

In order to calculate confusion matrices we need the a priori knowledge of the class for each pixel or at least for each considered unit of the image. Hence, experts have to give reference images, and we are in the second case of segmentation evaluation described above.

Before presenting our method of segmentation evaluation, we show how a deduced segmentation can easily be obtained from an image classification based on tiles. The proposed segmentation evaluation method can nevertheless be applied to any image segmentation and can take into account imprecise labels.

3.1. Deduced segmentation

Image classification provides an implicit image 
segmentation given by the difference of classes be- 
tween two adjacent tiles. Hence a good image 
classification evaluation should take this segmen- 
tation into account as well. 

First of all, we have to define the boundary pixels given by the image classification. We propose here to use a very simple approach: we take as boundary pixels the pixels which neighbor another class on the right and/or on the bottom. For instance, in Table 1 we give a dummy segmented image with two classes given by x and •. The classification unit is here 4 x 4. The boundary pixels are underlined.
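
The right/bottom neighbor rule can be sketched as below. The function name `boundary_pixels` and the 8 x 8 toy label image are illustrative assumptions; the rule itself is the one stated in the text.

```python
import numpy as np

def boundary_pixels(labels):
    """Mark pixels whose right or bottom neighbor belongs to another class."""
    labels = np.asarray(labels)
    boundary = np.zeros(labels.shape, dtype=bool)
    # Right neighbor differs (the last column has no right neighbor).
    boundary[:, :-1] |= labels[:, :-1] != labels[:, 1:]
    # Bottom neighbor differs (the last row has no bottom neighbor).
    boundary[:-1, :] |= labels[:-1, :] != labels[1:, :]
    return boundary

# Toy image made of 4 x 4 classification units: one unit of class 1
# in the top-left corner, class 0 elsewhere.
labels = np.zeros((8, 8), dtype=int)
labels[:4, :4] = 1
boundary = boundary_pixels(labels)
print(boundary.astype(int))
```

With this rule the boundary is always drawn on the upper-left side of a class change, so it stays one pixel wide.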

Many approaches can be considered in order to obtain boundaries without angular points. We can consider for instance an interpolation between the 4-connectivity or 8-connectivity points [22]. This is not the subject of this paper; the reader should keep in mind that our segmentation evaluation is general and can be applied to all image segmentations given by boundary pixels.

Table 1
Example of an obtained segmentation on image with two classes given by x and •.

3.2. Segmentation evaluation 

We recall here that in our case, we have an a priori knowledge of the correct or approximately correct segmentation given by the experts. In this case all evaluation approaches are based on different distances (or probabilities) between the segmented pixel and the reference pixel [1, 23, 24], and most of the time only one measure of mis-segmented pixels is given. We think that this is not enough for a precise segmentation evaluation, since a pixel can be neither mis-segmented nor well-segmented. As we mentioned before, we can have few mis-segmented pixels together with few well-segmented pixels, in which case the segmentation cannot be considered right. So we propose a linked study of two new measures: one well-segmented pixel measure and one mis-segmented pixel measure. Moreover these two measures can take into account the uncertainty of the expert on the position and on the existence of the boundaries if this uncertainty can be expressed as a weight.

3.2.1. Boundary good detection measure 

The well-segmented pixel measure is a measure of how well the boundary is detected, and the mis-segmented pixel measure tries to quantify how many boundaries detected by the algorithm to benchmark have no physical reality. First, we search for the minimal distance $d_{fe}$ between each boundary pixel $f$ found by the algorithm to benchmark and all the boundary pixels $e$ provided by the expert. Hence the pixel $e$ is a function of $f$, and we should write it $e_f$, but in order to simplify notations, it is referred to as $e$ in the rest of the paper. We take here a Euclidean distance but any other distance can be envisaged. The certainty weight of the pixel $e$ given by the expert is noted $W_e$. We define a well-detection criterion vector by:

$$DC_f = \exp\left(-(d_{fe} \cdot W_e)^2\right) \cdot W_e. \qquad (4)$$

This criterion gives a Gaussian-like distribution of weights with a standard deviation given by the certainty weights, as shown in Figure 2.




Figure 2. Distance weight for the well-detection 
criteria. 

The boundary good detection measure is defined by the normalized well-detection criterion given by:

$$WDC = \left(\frac{\sum_f DC_f}{\max_f(DC_f) \cdot \sum_e W_e}\right)^{a}. \qquad (5)$$

The normalization is made in order to obtain a measure defined between 0 and 1. However, in real applications, this criterion remains small even for very good boundary detection. So we take $a = 1/6$ in order to accentuate small values.
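
A brute-force sketch of equations (4) and (5) is given below. The function name `well_detection`, the array layout (a boolean image for the found boundary and a weight image for the expert boundary) and the toy data are assumptions made for illustration; both boundaries are assumed to be non-empty.

```python
import numpy as np

def well_detection(found, expert_weights, a=1 / 6):
    """Return (DC, WDC) following equations (4) and (5).

    found: boolean image, True on boundary pixels found by the algorithm.
    expert_weights: image equal to W_e on expert boundary pixels, 0 elsewhere.
    """
    found_idx = np.argwhere(found)                 # coordinates of pixels f
    expert_mask = expert_weights > 0
    expert_idx = np.argwhere(expert_mask)          # coordinates of pixels e
    w_all = expert_weights[expert_mask]            # W_e, same order as expert_idx
    # Euclidean distance from every found pixel to every expert pixel.
    diff = found_idx[:, None, :] - expert_idx[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    nearest = dist.argmin(axis=1)                  # index of e for each f
    d_fe = dist[np.arange(len(found_idx)), nearest]
    w_e = w_all[nearest]
    dc = np.exp(-(d_fe * w_e) ** 2) * w_e          # equation (4)
    wdc = (dc.sum() / (dc.max() * w_all.sum())) ** a  # equation (5)
    return dc, wdc

# Expert boundary: one horizontal line labeled "sure" (weight 2/3).
expert = np.zeros((7, 7))
expert[2, :] = 2 / 3
# Found boundary: three pixels at distances 0, 1 and 3 from the expert line.
found = np.zeros((7, 7), dtype=bool)
found[2, 0] = found[3, 3] = found[5, 6] = True
dc, wdc = well_detection(found, expert)
print(dc, wdc)
```

The pairwise distance matrix is quadratic in the number of boundary pixels; on large images a distance transform would be a more economical way to obtain the same nearest expert pixel.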

This criterion is not completely satisfying because it only takes into account the distance from the found boundary to the contour provided by the expert. However, the reference boundary also has a local direction, which is another piece of information we want to use. A boundary found by the algorithm can cross a boundary given by the expert orthogonally: in this case some pixels from the found boundary are very near (in terms of distance) to pixels from the given boundary but we do not want to say that this is a good detection. We propose two ways to consider the direction of boundaries.

In the first one, we count, for a given pixel $f$ of the found boundary, how many pixels from the found boundary are linked by the minimal distance to the same pixel $e$ of the reference boundary. This number is noted $n_{ef}$; e.g. in Figure 3 we have $n_{ef} = 3$ for three different $f$. We redefine the well-detection boundary measure by:



WDC = 




Figure 3. Example of $n_{ef}$ for three given $f$; the found boundary is represented by green squares and the reference boundary by a black line.

The problem is that the number $n_{ef}$ does not necessarily represent a number of pixels on the same boundary and properly accounts only for the orthogonal direction. However this measure gives the best evaluation of the proportion of the found boundaries.

The second method is based on the idea that the local direction of the boundary should also be taken into account: the direction of the detected boundary and the direction of the boundary given by the expert should be the same. Now, how does one compute the direction of the boundary? Let $I_r$ denote the reference boundary image given by the expert: $I_r(i,j) = 0$ if no boundary is detected at pixel $(i,j)$; $I_r(i,j) = W_e$ otherwise, where $W_e$ is the weight of the boundary pixel $e$ at location $(i,j)$ given by the expert. Image $I_r$ can be seen as a discrete 2-D function on which the gradient $\vec{g}_r = [\partial I_r/\partial x; \partial I_r/\partial y]$ can be computed. The gradient has the property of being normal to the iso-value lines of $I_r$ and will therefore be normal to the boundaries given by the expert. Similarly, one can also compute the gradient $\vec{g}_s$ of the found boundary image. Then, a measure of correspondence between the directions at each pixel can be given by the absolute value of the normalized dot product between the two gradient vectors¹:

$$BD = \frac{|\vec{g}_r \cdot \vec{g}_s|}{\|\vec{g}_r\| . \|\vec{g}_s\|} \qquad (7)$$
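
As a rough sketch of equation (7), the per-pixel direction agreement can be computed with finite-difference gradients. The function name `boundary_direction`, the use of `numpy.gradient` as the gradient estimator and the convention of returning 0 where a gradient vanishes are assumptions of this sketch.

```python
import numpy as np

def boundary_direction(ref_img, seg_img, eps=1e-12):
    """Per-pixel |g_r . g_s| / (||g_r|| ||g_s||); 0 where a gradient vanishes."""
    # numpy.gradient returns the derivative along rows (y) first, then columns (x).
    gy_r, gx_r = np.gradient(np.asarray(ref_img, dtype=float))
    gy_s, gx_s = np.gradient(np.asarray(seg_img, dtype=float))
    dot = np.abs(gx_r * gx_s + gy_r * gy_s)
    norm = np.hypot(gx_r, gy_r) * np.hypot(gx_s, gy_s)
    return np.divide(dot, norm, out=np.zeros_like(dot), where=norm > eps)

# Reference boundary: horizontal line with weight 2/3.
ref = np.zeros((7, 7))
ref[3, :] = 2 / 3
# Found boundary crossing the reference orthogonally: vertical line.
seg_orth = np.zeros((7, 7))
seg_orth[:, 3] = 1.0

bd_same = boundary_direction(ref, ref)       # identical directions
bd_orth = boundary_direction(ref, seg_orth)  # orthogonal directions
print(bd_same[2, 3], bd_orth[2, 2])
```

Most entries of both results are zero because the gradient vanishes away from the lines, which is precisely the limitation of the plain gradient on sparse boundary images.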



However, as $I_r$ is mostly filled with zeros, the gradient will have a negligible value at most locations. The farther a pixel is from a boundary given by the operator, the lower the gradient at that pixel will be, thus yielding a huge imprecision on the local direction of the image. To solve this problem, we used the Gradient Vector Flow (GVF), first introduced by Xu and Prince [25]. For a boundary image $I$, the GVF is a vector field $\vec{f} = [u(x,y); v(x,y)]$ that is computed iteratively so as to minimize the following cost function over the whole boundary image:

$$\iint \left( \mu \, (u_x^2 + u_y^2 + v_x^2 + v_y^2) + \|\vec{g}\|^2 \, \|\vec{f} - \vec{g}\|^2 \right) dx\,dy \qquad (8)$$

where $\mu$ is a tunable weight, variables in indices denote partial derivation with respect to that variable, and $\vec{g}$ is the gradient of the image as defined previously. This cost function was devised so that on boundaries, where the gradient is high ($\|\vec{g}\| \to \infty$), the energy remains bounded: $\|\vec{f} - \vec{g}\|$ must tend to zero if one wishes the integrand to be minimized. Thus, on boundaries, the GVF is equal to the gradient field. On the other hand, for pixels far away from any boundary, the gradient will tend toward zero, and the integrand will be driven by the term $\mu \, (u_x^2 + u_y^2 + v_x^2 + v_y^2)$. To minimize it, the partial derivatives of the vector field $\vec{f}$ must be null, which means that the GVF extends the gradient by continuity to zones where it would normally be negligible. The GVF is computed both for the reference image and the image obtained through segmentation. The measure of correspondence between the boundary directions will be similar to equation (7):

$$BD = \frac{|\vec{f}_r \cdot \vec{f}_s|}{\|\vec{f}_r\| . \|\vec{f}_s\|} \qquad (9)$$

¹ The notation "." for multiplication is a term by term multiplication of the two matrices.



In Figure 4, note that the gradient is only strong on edges, whereas the GVF is strong everywhere, thus enabling the local directions to be seen.

Figure 4. Computing the direction of the boundaries: gradient (top), GVF (bottom).
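
The diffusion behavior of the GVF can be sketched with the explicit iteration commonly used for the Xu and Prince formulation. This is a minimal sketch under stated assumptions: the function names, the value of `mu`, the number of iterations, the unit time step and the replicated-border Laplacian are all illustrative choices, not the settings used for the results reported here.

```python
import numpy as np

def laplacian(a):
    """5-point Laplacian with replicated borders."""
    p = np.pad(a, 1, mode="edge")
    return p[:-2, 1:-1] + p[2:, 1:-1] + p[1:-1, :-2] + p[1:-1, 2:] - 4 * a

def gradient_vector_flow(img, mu=0.2, n_iter=200):
    """Iteratively diffuse the gradient of img; returns the field (u, v).

    With a unit time step the explicit scheme needs mu <= 0.25 to stay stable.
    """
    gy, gx = np.gradient(np.asarray(img, dtype=float))
    mag2 = gx ** 2 + gy ** 2          # squared gradient norm (data term weight)
    u, v = gx.copy(), gy.copy()       # start from the gradient itself
    for _ in range(n_iter):
        # Smoothness term (mu * Laplacian) plus attachment to the gradient
        # where the gradient is strong.
        u = u + mu * laplacian(u) - mag2 * (u - gx)
        v = v + mu * laplacian(v) - mag2 * (v - gy)
    return u, v

# Boundary image with a single horizontal line.
img = np.zeros((21, 21))
img[10, :] = 1.0
gy, gx = np.gradient(img)
u, v = gradient_vector_flow(img)
# Far from the line the gradient is exactly zero but the GVF is not.
print(gy[5, 10], v[5, 10])
```

The direction agreement of equation (9) would then be obtained by applying the normalized dot product to the two GVF fields instead of the raw gradients.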



Hence, we can redefine $DC_f$ in equation (6)
by $(DC.BD)_f$, so that we obtain a new measure
which takes into account the local direction of the
found boundaries.

3.2.2. False detection boundary measure

The boundary false detection measure is based on the same principle as the well-detected boundary measure, but the Gaussian-like distribution of weights must be inverted. Hence we can define a false detection criterion by:

$$FDC_f = 1 - DC_f / W_e, \qquad (10)$$

where the pixels $f$ and $e$ are linked by the minimal distance $d_{fe}$. As a consequence, the false detection boundary measure can be defined by the normalized false detection criterion:

$$FD = 1 - \exp\left(-\frac{\sum_f FDC_f / n_{ef}}{\max_f(FDC_f / n_{ef}) \cdot \sum_e W_e}\right). \qquad (11)$$
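
To see how the false detection criterion of equation (10) behaves, it may help to substitute equation (4) into it. The numerical value below is only an illustration for a boundary labeled "sure" ($W_e = 2/3$) and a found pixel one pixel away from it.

```latex
\begin{align*}
FDC_f &= 1 - \frac{DC_f}{W_e}
       = 1 - \exp\left(-(d_{fe} \cdot W_e)^2\right), \\
d_{fe} = 0 &\;\Rightarrow\; FDC_f = 0
  \quad\text{(the found pixel lies on the expert boundary)}, \\
d_{fe} \to \infty &\;\Rightarrow\; FDC_f \to 1
  \quad\text{(the found pixel is far from any expert boundary)}, \\
W_e = \tfrac{2}{3},\ d_{fe} = 1 &\;\Rightarrow\;
  FDC_f = 1 - e^{-4/9} \approx 0.36 .
\end{align*}
```

A lower certainty weight widens the tolerance: for the same distance, a boundary labeled "not sure" gives a smaller false detection value than one labeled "sure".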

In order to take into account the local direction of the found boundaries as found with the GVF, we can redefine $FDC_f$ in equation (11) by $(FDC.(1 - BD))_f$, so that we obtain another new false detection criterion.

Here we have described the use of the measures FD and WDC for one image classified by the algorithm and one reference image provided by a single expert. In order to evaluate image segmentation algorithms on many images we can use a weighted sum of both measures, taking into account the image sizes, which can differ across the considered images.

In conclusion of this section, we have described 
two new measures FD and WDC taking into ac- 
count the uncertainty of different experts on the 
seen boundaries. We have to consider these two 
measures together. 

4. ILLUSTRATION 

We present here an illustration of our image classification and segmentation evaluation on real sonar images. Indeed, the underwater environment is a very uncertain environment and it is particularly important to classify the seabed for numerous applications such as Autonomous Underwater Vehicle navigation. In recent sonar works (e.g. [26, 27]), the classification evaluation is made only by visual comparison of one original image and the classified image. This is not sufficient to correctly evaluate image classification and segmentation. First we present our database, labeled by two different experts with different certainties. Then, one possible classification approach for sonar images is presented. Finally the automatic classification and segmentation obtained by this approach is evaluated with our new evaluation method.

Note that this illustration is presented in order 
to show how our measures work on only one clas- 
sifier. In order to evaluate a classifier, we have to 
compare the results with another classifier or with 
other parametrization of the evaluated classifier. 

4.1. Database 

Our database contains 42 sonar images provided by the GESMA (Groupe d'Etudes Sous-Marines de l'Atlantique). These images were obtained with a Klein 5400 lateral sonar with a resolution of 20 to 30 cm in azimuth and 3 cm in range. The sea-bottom depth was between 15 m and 40 m.

The experts have manually segmented these images, giving the kind of feature visible in a given part of the image: sediment (rock, cobble, sand, silt, ripple, either horizontal, vertical or at 45 degrees), shadow or other features (typically shipwrecks). All sediments are given with three certainty levels (sure, moderately sure or not sure), and the boundary between two sediments is also given with a certainty (sure, moderately sure or not sure). Hence, every pixel of every image is labeled as being either a certain type of sediment or a shadow, or a boundary with one of the three certainty levels. Figure 1 gives an example of such a segmentation provided by the experts.

4.2. Classification approach 

The classification approach is based on supervised classification. In order to train the classifier we have randomly divided the database into two parts. On the learning database we have considered, on randomly chosen images only, the homogeneous tiles with a 32 x 32 size and with a sure or moderately sure certainty level, until we obtained approximately the same number of tiles in the learning and test databases. On the test database we have considered tiles with a 32 x 32 size and an overlapping step of 4. On each tile we have extracted some features by a wavelet decomposition.

The discrete translation invariant wavelet transform is based on the choice of the optimal translation for each decomposition level. Each decomposition level $d$ gives four new images. We choose here a decomposition level $d = 2$. For each image $P_d^i$ (the $i$-th image of the decomposition $d$) we calculate three features. The energy is given by:



$$\frac{1}{NM} \sum_{n=1}^{N} \sum_{m=1}^{M} P_d^i(n,m), \qquad (12)$$



where N and M are respectively the number of 
pixels on the rows, and on the columns. The en- 
tropy is estimated by: 



1 



N M
and the mean is given by:
(14)



n— 1 m— 1 



Consequently we obtain 15 features (3+4*3).

The chosen classifier is based on a Support Vector Machine. The algorithm used here is described in [28]. It is a one-vs-one multi-class approach, and we take a linear kernel with a constant C = 1.

We have considered only three classes for learn- 
ing and tests: 

- class 1: Rock and Cobble 

- class 2: Ripple in all directions 

- class 3: Sand and Silt 

Hence shadow is not considered and so the classification cannot be good on tiles with shadow. In order to take into account unknown classes, one solution is to add a rejected class in the classifier. However, as we show below, we can also take into account this class if the classifier has no rejected class.

The units of the classifier are tiles with a 32 x 32 size with an overlapping step of 4. Hence, we can classify tiles with a 4 x 4 size, considering the tile of 4 x 4 size in the middle of each tile of 32 x 32.
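
The tiling scheme can be sketched as a sliding window whose decision is written into the central 4 x 4 block. The function `classify_by_tiles`, the placeholder classifier and the value -1 for pixels that receive no label are assumptions made for this sketch; the actual classifier is the SVM described above.

```python
import numpy as np

def classify_by_tiles(image, classify_tile, tile=32, step=4):
    """Slide a tile x tile window with the given step and assign the
    predicted class to the step x step block in the middle of the window."""
    h, w = image.shape
    labels = np.full((h, w), -1, dtype=int)   # -1: pixel not classified
    off = (tile - step) // 2                  # offset of the central block
    for r in range(0, h - tile + 1, step):
        for c in range(0, w - tile + 1, step):
            label = classify_tile(image[r:r + tile, c:c + tile])
            labels[r + off:r + off + step, c + off:c + off + step] = label
    return labels

# Placeholder classifier (hypothetical): thresholds the mean intensity.
def toy_classifier(tile_pixels):
    return int(tile_pixels.mean() > 0.5)

image = np.zeros((40, 40))
image[:, 20:] = 1.0
labels = classify_by_tiles(image, toy_classifier)
print(labels[14:26, 14:26])
```

A band of pixels along the image border never falls in a central block and therefore stays unlabeled in this sketch.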



4.3. Evaluation 

Figure 5 shows the result of the classification of the same image as the one given in Figure 1. Sand (in red) and rock (in blue) are quite well classified but ripple (in yellow) is not well segmented. The dark blue corresponds to the part of the image that was not considered for the classification.







Figure 5. Automatic segmented image. 

Just by looking at Figure 5, we cannot say
whether the classification is good or not, and
any decision remains very subjective. Moreover,
the classification algorithm could be good for
this image and not for others. We therefore
propose to use our measures. The certainty weights
used here are 2/3 for sure, 1/2 for moderately
sure and 1/3 for not sure. Other weights can be
preferred according to the application.
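
To make the role of the certainty weights concrete, the following simplified sketch accumulates them into a confusion matrix and normalizes each row to percentages. It assumes homogeneous units carrying a single expert label each, which is a simplification of the confusion matrix defined earlier in the paper; the function names and the sample format are hypothetical.

```python
# Certainty weights quoted in the text.
CERTAINTY_WEIGHTS = {"sure": 2 / 3, "moderately sure": 1 / 2, "not sure": 1 / 3}


def weighted_confusion(samples, n_true, n_pred, weights=CERTAINTY_WEIGHTS):
    """Accumulate certainty weights per (expert class, predicted class).

    samples: iterable of (expert_class, predicted_class, certainty),
    with 1-based class indices. Assumes one expert label per unit.
    """
    matrix = [[0.0] * n_pred for _ in range(n_true)]
    for true_c, pred_c, certainty in samples:
        matrix[true_c - 1][pred_c - 1] += weights[certainty]
    return matrix


def normalize_rows(matrix):
    """Express each row in percent of its total; empty rows stay at zero."""
    out = []
    for row in matrix:
        total = sum(row)
        out.append([100.0 * v / total if total else 0.0 for v in row])
    return out
```

A unit labeled with low certainty thus contributes half as much as a unit labeled with full certainty, so doubtful ground truth weighs less in the final rates.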

The normalized confusion matrix obtained for
one random partition of the database is given
by:

\begin{pmatrix}
40.51 & 5.77 & 53.72 \\
19.65 & 18.79 & 61.56 \\
3.51 & 1.15 & 95.34 \\
45.96 & 12.47 & 41.57
\end{pmatrix} \qquad (15)


The last row corresponds to shadow or other parts
of the image, which the classifier necessarily
assigns to class 1, 2 or 3. We can note that a
high proportion of the rock or cobble (class 1) is
classified as sand or silt (class 3), as is most
of the ripple (class 2). Sand and silt, the most
common kinds of sediment in our images, are very
well classified. The vector of good classification
rates, given by [40.51 18.79 95.34 0], and the
vector of error classification rates, given by
[41.26 43.84 28.47 50.00], summarize these results.
Although we have good classification for sand and
silt, we also have a lot of errors because other
sediments are classified as sand or silt.
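
As a small check on the figures above, the sketch below reads the good classification rates off the diagonal of the normalized confusion matrix (15) and verifies that each row sums to 100. The error classification rates depend on the definition given earlier in the paper and are not recomputed here; the function names are hypothetical.

```python
# Normalized confusion matrix (15): rows are expert classes 1 to 4
# (class 4 being shadow or other), columns are predicted classes 1 to 3.
CONFUSION = [
    [40.51, 5.77, 53.72],
    [19.65, 18.79, 61.56],
    [3.51, 1.15, 95.34],
    [45.96, 12.47, 41.57],
]


def good_classification_rates(matrix):
    """Diagonal entries, with 0 for rows that have no matching column."""
    n_pred = len(matrix[0])
    return [row[i] if i < n_pred else 0.0 for i, row in enumerate(matrix)]


def row_sums(matrix):
    """Row totals rounded to two decimals."""
    return [round(sum(row), 2) for row in matrix]
```

The fourth rate is 0 because the classifier has no output for shadow, so no tile of that class can be correctly classified.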

These results are not sufficient to evaluate the
obtained segmentation properly. Our proposed
measures, given respectively by equations (6) and
(11) and expressed in percent, are 65.17 for the
good detection criterion and 61.35 for the false
alarm criterion. If we consider the direction
based on the GVF, the proposed measures give 63.11
for the good detection criterion and 64.84 for the
false alarm criterion.

To better illustrate these last two measures, we
have drawn four more random partitions.
We obtain a mean of 63.53 for the good detection
criterion, with a standard deviation of 3.37, and
a mean of 60.53 for the false alarm criterion,
with a standard deviation of 7.72. If we consider
the direction based on the GVF, we obtain a mean
of 60.09 for the good detection criterion, with a
standard deviation of 3.13, and a mean of 52.62
for the false alarm criterion, with a standard
deviation of 8.04. The standard deviations show
that the good detection criterion is more stable
than the false alarm criterion. Our two measures
thus evaluate both good detection and false alarm.
When we consider the direction based on the GVF,
the criteria decrease because of the weights given
by the directions. Here, the deduced segmentation
depends on the size of the tile; in this case it
may be better not to consider the direction based
on the GVF.

In order to evaluate the classifier approach, all 
these measures have to be compared to the same 
measures calculated for other parameterizations 
or for other classifier algorithms. 

5. CONCLUSION 

We have proposed new evaluation measures for
image classification and segmentation in uncertain
environments. These measures take uncertain labels
into account. The proposed classification
evaluation can be used for the classification of
any kind of uncertain elements, and our
segmentation evaluation can be used for all image
segmentation approaches. We have shown that a
global evaluation of image classification must
combine the evaluation of the classification with
the evaluation of the segmentation it produces.
The proposed confusion matrix takes into account
the uncertainty of the expert and also the
inhomogeneous units (e.g. tiles in the case of
local image classification). Moreover, we have
defined good classification and error
classification rates from our confusion matrix.
The proposed segmentation evaluation considers
good and false boundary detection measures, where
the subjectivity of the expert is taken into
account through the uncertainty given on the
boundaries.

In our proposed evaluation approach, the fusion of
the information provided by several experts is
made after an individual evaluation, which means
that we fuse the measures calculated for each
expert. This fusion is made by a simple sum: the
uncertainty is considered directly in our
measures. We can also imagine fusing the
information provided by the experts before the
evaluation, in order to obtain an uncertain and/or
imprecise reality (e.g. by defining fuzzy zones
around the boundaries according to the certainty
given by the experts). The fusion can also be made
with belief functions defined from the
uncertainties. In this case we have to redefine
our proposed measures. For instance, the reality
obtained by the fusion of experts could be used to
improve the learning step of the classification.